Bayesian Clustering with Priors on Partitions

نویسنده

  • Tim B. Swartz
چکیده

Traditional clustering algorithms are deterministic in the sense that a given dataset always leads to the same output partition. This paper modifies traditional clustering algorithms whereby data is associated with a probability model, and clustering is carried out on the stochastic model parameters rather than the data. This is done in a principled way using a Bayesian approach which allows the assignment of posterior probabilities to output partitions. In addition, the approach incorporates prior knowledge of the output partitions using Bayesian melding. The methodology is applied to two substantive problems; (1) a question of stylometry involving a simulated dataset and (2) the assessment of potential champions of the 2010 FIFA World Cup.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Online Video Segmentation by Bayesian Split-Merge Clustering

We present an online video segmentation algorithm based on a novel nonparametric Bayesian clustering method called Bayesian Split-Merge Clustering (BSMC). BSMC can efficiently cluster dynamically changing data through split and merge processes at each time step, where the decision for splitting and merging is made by approximate posterior distributions over partitions with Dirichlet Process (DP...

متن کامل

Nonparametric Bayesian Co-clustering Ensembles

A nonparametric Bayesian approach to co-clustering ensembles is presented. Similar to clustering ensembles, coclustering ensembles combine various base co-clustering results to obtain a more robust consensus co-clustering. To avoid pre-specifying the number of co-clusters, we specify independent Dirichlet process priors for the row and column clusters. Thus, the numbers of rowand column-cluster...

متن کامل

Bayesian Sample size Determination for Longitudinal Studies with Continuous Response using Marginal Models

Introduction Longitudinal study designs are common in a lot of scientific researches, especially in medical, social and economic sciences. The reason is that longitudinal studies allow researchers to measure changes of each individual over time and often have higher statistical power than cross-sectional studies. Choosing an appropriate sample size is a crucial step in a successful study. A st...

متن کامل

On a class of random probability measures with general predictive structure

In this paper we investigate a recently introduced class of nonparametric priors, termed generalized Dirichlet process priors. Such priors induce (exchangeable random) partitions which are characterized by a more elaborate clustering structure than those arising from other widely used priors. A natural area of application of these random probability measures is represented by species sampling p...

متن کامل

Bayesian Inference for Spatial Beta Generalized Linear Mixed Models

In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on (0, 1) interval that covers symm...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011